Pesquisa | Portal Regional da BVS

1.

Molecular robotic agents that survey molecular landscapes for information retrieval.

Woo, Sungwook; Saka, Sinem K; Xuan, Feng; Yin, Peng.

Nat Commun ; 15(1): 3293, 2024 Apr 17.

Artigo em Inglês | MEDLINE | ID: mdl-38632239

RESUMO

DNA-based artificial motors have allowed the recapitulation of biological functions and the creation of new features. Here, we present a molecular robotic system that surveys molecular environments and reports spatial information in an autonomous and repeated manner. A group of molecular agents, termed 'crawlers', roam around and copy information from DNA-labeled targets, generating records that reflect their trajectories. Based on a mechanism that allows random crawling, we show that our system is capable of counting the number of subunits in example molecular complexes. Our system can also detect multivalent proximities by generating concatenated records from multiple local interactions. We demonstrate this capability by distinguishing colocalization patterns of three proteins inside fixed cells under different conditions. These mechanisms for examining molecular landscapes may serve as a basis towards creating large-scale detailed molecular interaction maps inside the cell with nanoscale resolution.

Assuntos

Procedimentos Cirúrgicos Robóticos , DNA , Proteínas , Fenômenos Biofísicos , Armazenamento e Recuperação da Informação

2.

Evaluating Linkage Quality of Population-Based Administrative Data for Health Service Research.

Kim, Ji-Woo; Choi, Hyojung; Lim, Hyun Jeung; Oh, Miae; Ahn, Jae Joon.

J Korean Med Sci ; 39(14): e127, 2024 Apr 15.

Artigo em Inglês | MEDLINE | ID: mdl-38622936

RESUMO

BACKGROUND: To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage includes errors owing to legal issues surrounding personal information and technical issues related to data processing. Linkage errors affect selection bias, and external and internal validity. Therefore, quality verification for each connection method with adherence to personal information protection is an important issue. This study evaluated the linkage quality of linked data and analyzed the potential bias resulting from linkage errors. METHODS: This study analyzed claims data submitted to the Health Insurance Review and Assessment Service (HIRA DATA). The linkage errors of the two deterministic linkage methods were evaluated based on the use of the match key. The first deterministic linkage uses a unique identification number, and the second deterministic linkage uses the name, gender, and date of birth as a set of partial identifiers. The linkage error included in this deterministic linkage method was compared with the absolute standardized difference (ASD) of Cohen's according to the baseline characteristics, and the linkage quality was evaluated through the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score. RESULTS: For the deterministic linkage method that used the name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5 and the missed match rate was 16.5. Although there was bias in some characteristics of the data, most of the ASD values were less than 0.1, with no case greater than 0.5. Therefore, it is difficult to determine whether linked data constructed with deterministic linkages have substantial differences. CONCLUSION: This study confirms the possibility of building health and medical data at the national level as the first data linkage quality verification study using big data from the HIRA. Analyzing the quality of linkages is crucial for comprehending linkage errors and generating reliable analytical outcomes. Linkers should increase the reliability of linked data by providing linkage error-related information to researchers. The results of this study will serve as reference data to increase the reliability of multicenter data linkage studies.

Assuntos

Armazenamento e Recuperação da Informação , Registro Médico Coordenado , Humanos , Reprodutibilidade dos Testes , Registro Médico Coordenado/métodos , Valor Preditivo dos Testes , Serviços de Saúde

3.

Biomedical semantic text summarizer.

Kirmani, Mahira; Kour, Gagandeep; Mohd, Mudasir; Sheikh, Nasrullah; Khan, Dawood Ashraf; Maqbool, Zahid; Wani, Mohsin Altaf; Wani, Abid Hussain.

BMC Bioinformatics ; 25(1): 152, 2024 Apr 16.

Artigo em Inglês | MEDLINE | ID: mdl-38627652

RESUMO

BACKGROUND: Text summarization is a challenging problem in Natural Language Processing, which involves condensing the content of textual documents without losing their overall meaning and information content, In the domain of bio-medical research, summaries are critical for efficient data analysis and information retrieval. While several bio-medical text summarizers exist in the literature, they often miss out on an essential text aspect: text semantics. RESULTS: This paper proposes a novel extractive summarizer that preserves text semantics by utilizing bio-semantic models. We evaluate our approach using ROUGE on a standard dataset and compare it with three state-of-the-art summarizers. Our results show that our approach outperforms existing summarizers. CONCLUSION: The usage of semantics can improve summarizer performance and lead to better summaries. Our summarizer has the potential to aid in efficient data analysis and information retrieval in the field of biomedical research.

Assuntos

Algoritmos , Pesquisa Biomédica , Semântica , Armazenamento e Recuperação da Informação , Processamento de Linguagem Natural

4.

Remote sensing image information extraction based on Compensated Fuzzy Neural Network and big data analytics.

Sun, Rui; Zhang, Zhengyin; Liu, Yajun; Niu, Xiaohang; Yuan, Jie.

BMC Med Imaging ; 24(1): 86, 2024 Apr 10.

Artigo em Inglês | MEDLINE | ID: mdl-38600525

RESUMO

Medical imaging AI systems and big data analytics have attracted much attention from researchers of industry and academia. The application of medical imaging AI systems and big data analytics play an important role in the technology of content based remote sensing (CBRS) development. Environmental data, information, and analysis have been produced promptly using remote sensing (RS). The method for creating a useful digital map from an image data set is called image information extraction. Image information extraction depends on target recognition (shape and color). For low-level image attributes like texture, Classifier-based Retrieval(CR) techniques are ineffective since they categorize the input images and only return images from the determined classes of RS. The issues mentioned earlier cannot be handled by the existing expertise based on a keyword/metadata remote sensing data service model. To get over these restrictions, Fuzzy Class Membership-based Image Extraction (FCMIE), a technology developed for Content-Based Remote Sensing (CBRS), is suggested. The compensation fuzzy neural network (CFNN) is used to calculate the category label and fuzzy category membership of the query image. Use a basic and balanced weighted distance metric. Feature information extraction (FIE) enhances remote sensing image processing and autonomous information retrieval of visual content based on time-frequency meaning, such as color, texture and shape attributes of images. Hierarchical nested structure and cyclic similarity measure produce faster queries when searching. The experiment's findings indicate that applying the proposed model can have favorable outcomes for assessment measures, including Ratio of Coverage, average means precision, recall, and efficiency retrieval that are attained more effectively than the existing CR model. In the areas of feature tracking, climate forecasting, background noise reduction, and simulating nonlinear functional behaviors, CFNN has a wide range of RS applications. The proposed method CFNN-FCMIE achieves a minimum range of 4-5% for all three feature vectors, sample mean and comparison precision-recall ratio, which gives better results than the existing classifier-based retrieval model. This work provides an important reference for medical imaging artificial intelligence system and big data analysis.

Assuntos

Inteligência Artificial , Tecnologia de Sensoriamento Remoto , Humanos , Ciência de Dados , Armazenamento e Recuperação da Informação , Redes Neurais de Computação

5.

nhanesA: achieving transparency and reproducibility in NHANES research.

Ale, Laha; Gentleman, Robert; Sonmez, Teresa Filshtein; Sarkar, Deepayan; Endres, Christopher.

Database (Oxford) ; 20242024 Apr 15.

Artigo em Inglês | MEDLINE | ID: mdl-38625809

RESUMO

The National Health and Nutrition Examination Survey provides comprehensive data on demographics, sociology, health and nutrition. Conducted in 2-year cycles since 1999, most of its data are publicly accessible, making it pivotal for research areas like studying social determinants of health or tracking trends in health metrics such as obesity or diabetes. Assembling the data and analyzing it presents a number of technical and analytic challenges. This paper introduces the nhanesA R package, which is designed to assist researchers in data retrieval and analysis and to enable the sharing and extension of prior research efforts. We believe that fostering community-driven activity in data reproducibility and sharing of analytic methods will greatly benefit the scientific community and propel scientific advancements. Database URL: https://github.com/cjendres1/nhanes.

Assuntos

Armazenamento e Recuperação da Informação , Inquéritos Nutricionais , Reprodutibilidade dos Testes , Bases de Dados Factuais

6.

Securing cloud data using secret key 4 optimization algorithm (SK4OA) with a non-linearity run time trend.

Frimpong, Twum; Hayfron Acquah, James Benjamin; Missah, Yaw Marfo; Dawson, John Kwao; Ayawli, Ben Beklisi Kwame; Baah, Philemon; Sam, Samuel Akyeramfo.

PLoS One ; 19(4): e0301760, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38625954

RESUMO

Cloud computing alludes to the on-demand availability of personal computer framework resources, primarily information storage and processing power, without the customer's direct personal involvement. Cloud computing has developed dramatically among many organizations due to its benefits such as cost savings, resource pooling, broad network access, and ease of management; nonetheless, security has been a major concern. Researchers have proposed several cryptographic methods to offer cloud data security; however, their execution times are linear and longer. A Security Key 4 Optimization Algorithm (SK4OA) with a non-linear run time is proposed in this paper. The secret key of SK4OA determines the run time rather than the size of the data as such is able to transmit large volumes of data with minimal bandwidth and able to resist security attacks like brute force since its execution timings are unpredictable. A data set from Kaggle was used to determine the algorithm's mean and standard deviation after thirty (30) times of execution. Data sizes of 3KB, 5KB, 8KB, 12KB, and 16 KB were used in this study. There was an empirical analysis done against RC4, Salsa20, and Chacha20 based on encryption time, decryption time, throughput and memory utilization. The analysis showed that SK4OA generated lowest mean non-linear run time of 5.545±2.785 when 16KB of data was executed. Additionally, SK4OA's standard deviation was greater, indicating that the observed data varied far from the mean. However, RC4, Salsa20, and Chacha20 showed smaller standard deviations making them more clustered around the mean resulting in predictable run times.

Assuntos

Algoritmos , Armazenamento e Recuperação da Informação , Computação em Nuvem , Segurança Computacional , Microcomputadores

7.

Efficient DNA-based data storage using shortmer combinatorial encoding.

Preuss, Inbal; Rosenberg, Michael; Yakhini, Zohar; Anavy, Leon.

Sci Rep ; 14(1): 7731, 2024 04 02.

Artigo em Inglês | MEDLINE | ID: mdl-38565928

RESUMO

Data storage in DNA has recently emerged as a promising archival solution, offering space-efficient and long-lasting digital storage solutions. Recent studies suggest leveraging the inherent redundancy of synthesis and sequencing technologies by using composite DNA alphabets. A major challenge of this approach involves the noisy inference process, obstructing large composite alphabets. This paper introduces a novel approach for DNA-based data storage, offering, in some implementations, a 6.5-fold increase in logical density over standard DNA-based storage systems, with near-zero reconstruction error. Combinatorial DNA encoding uses a set of clearly distinguishable DNA shortmers to construct large combinatorial alphabets, where each letter consists of a subset of shortmers. We formally define various combinatorial encoding schemes and investigate their theoretical properties. These include information density and reconstruction probabilities, as well as required synthesis and sequencing multiplicities. We then propose an end-to-end design for a combinatorial DNA-based data storage system, including encoding schemes, two-dimensional (2D) error correction codes, and reconstruction algorithms, under different error regimes. We performed simulations and show, for example, that the use of 2D Reed-Solomon error correction has significantly improved reconstruction rates. We validated our approach by constructing two combinatorial sequences using Gibson assembly, imitating a 4-cycle combinatorial synthesis process. We confirmed the successful reconstruction, and established the robustness of our approach for different error types. Subsampling experiments supported the important role of sampling rate and its effect on the overall performance. Our work demonstrates the potential of combinatorial shortmer encoding for DNA-based data storage and describes some theoretical research questions and technical challenges. Combining combinatorial principles with error-correcting strategies, and investing in the development of DNA synthesis technologies that efficiently support combinatorial synthesis, can pave the way to efficient, error-resilient DNA-based storage solutions.

Assuntos

Replicação do DNA , DNA , Análise de Sequência de DNA/métodos , DNA/genética , Algoritmos , Armazenamento e Recuperação da Informação

8.

A novel Dual-Branch Asymmetric Encoder-Decoder Segmentation Network for accurate colonic crypt segmentation.

Zhou, Jingjun; Xiong, Hong; Liu, Qian.

Comput Biol Med ; 173: 108354, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38522251

RESUMO

Colorectal cancer (CRC) is a leading cause of cancer-related deaths, with colonic crypts (CC) being crucial in its development. Accurate segmentation of CC is essential for decisions CRC and developing diagnostic strategies. However, colonic crypts' blurred boundaries and morphological diversity bring substantial challenges for automatic segmentation. To mitigate this problem, we proposed the Dual-Branch Asymmetric Encoder-Decoder Segmentation Network (DAUNet), a novel and efficient model tailored for confocal laser endomicroscopy (CLE) CC images. In DAUNet, we crafted a dual-branch feature extraction module (DFEM), employing Focus operations and dense depth-wise separable convolution (DDSC) to extract multiscale features, boosting semantic understanding and coping with the morphological diversity of CC. We also introduced the feature fusion guided module (FFGM) to adaptively combine features from both branches using cross-group spatial and channel attention to improve the model representation in focusing on specific lesion features. These modules are seamlessly integrated into the encoder for effective multiscale information extraction and fusion, and DDSC is further introduced in the decoder to provide rich representations for precise segmentation. Moreover, the local multi-layer perceptron (LMLP) module is designed to decouple and recalibrate features through a local linear transformation that filters out the noise and refines features to provide edge-enriched representation. Experimental evaluations on two datasets demonstrate that the proposed method achieves Intersection over Union (IoU) scores of 81.54% and 84.83%, respectively, which are on par with state-of-the-art methods, exhibiting its effectiveness for CC segmentation. The proposed method holds great potential in assisting physicians with precise lesion localization and region analysis, thereby improving the diagnostic accuracy of CRC.

Assuntos

Colo , 60670 , Colo/diagnóstico por imagem , Armazenamento e Recuperação da Informação , Redes Neurais de Computação , Semântica , Processamento de Imagem Assistida por Computador

9.

M³YOLOv5: Feature enhanced YOLOv5 model for mandibular fracture detection.

Zhou, Tao; Wang, Hongwei; Du, Yuhu; Liu, Fengzhen; Guo, Yujie; Lu, Huiling.

Comput Biol Med ; 173: 108291, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38522254

RESUMO

BACKGROUND: It is very important to detect mandibular fracture region. However, the size of mandibular fracture region is different due to different anatomical positions, different sites and different degrees of force. It is difficult to locate and recognize fracture region accurately. METHODS: To solve these problems, M3YOLOv5 model is proposed in this paper. Three feature enhancement strategies are designed, which improve the ability of model to locate and recognize mandibular fracture region. Firstly, Global-Local Feature Extraction Module (GLFEM) is designed. By effectively combining Convolutional Neural Network (CNN) and Transformer, the problem of insufficient global information extraction ability of CNN is complemented, and the positioning ability of the model to the fracture region is improved. Secondly, in order to improve the interaction ability of context information, Deep-Shallow Feature Interaction Module (DSFIM) is designed. In this module, the spatial information in the shallow feature layer is embedded to the deep feature layer by the spatial attention mechanism, and the semantic information in the deep feature layer is embedded to the shallow feature layer by the channel attention mechanism. The fracture region recognition ability of the model is improved. Finally, Multi-scale Multi receptive-field Feature Mixing Module (MMFMM) is designed. Deep separate convolution chains are used in this modal, which is composed by multiple layers of different scales and different dilation coefficients. This method provides richer receptive field for the model, and the ability to detect fracture region of different scales is improved. RESULTS: The precision rate, mAP value, recall rate and F1 value of M3YOLOv5 model on mandibular fracture CT data set are 97.18%, 96.86%, 94.42% and 95.58% respectively. The experimental results show that there is better performance about M3YOLOv5 model than the mainstream detection models. CONCLUSION: The M3YOLOv5 model can effectively recognize and locate the mandibular fracture region, which is of great significance for doctors' clinical diagnosis.

Assuntos

Fraturas Mandibulares , Humanos , Fraturas Mandibulares/diagnóstico por imagem , Armazenamento e Recuperação da Informação , Redes Neurais de Computação , Semântica

10.

DNA Bloom Filter enables anti-contamination and file version control for DNA-based data storage.

Li, Yiming; Zhang, Haoling; Chen, Yuxin; Shen, Yue; Ping, Zhi.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38555478

RESUMO

DNA storage is one of the most promising ways for future information storage due to its high data storage density, durable storage time and low maintenance cost. However, errors are inevitable during synthesizing, storing and sequencing. Currently, many error correction algorithms have been developed to ensure accurate information retrieval, but they will decrease storage density or increase computing complexity. Here, we apply the Bloom Filter, a space-efficient probabilistic data structure, to DNA storage to achieve the anti-error, or anti-contamination function. This method only needs the original correct DNA sequences (referred to as target sequences) to produce a corresponding data structure, which will filter out almost all the incorrect sequences (referred to as non-target sequences) during sequencing data analysis. Experimental results demonstrate the universal and efficient filtering capabilities of our method. Furthermore, we employ the Counting Bloom Filter to achieve the file version control function, which significantly reduces synthesis costs when modifying DNA-form files. To achieve cost-efficient file version control function, a modified system based on yin-yang codec is developed.

Assuntos

Algoritmos , DNA , Análise de Sequência de DNA/métodos , DNA/genética , DNA/química , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Armazenamento e Recuperação da Informação

11.

D3EGFR: a webserver for deep learning-guided drug sensitivity prediction and drug response information retrieval for EGFR mutation-driven lung cancer.

Shi, Yulong; Li, Chongwu; Zhang, Xinben; Peng, Cheng; Sun, Peng; Zhang, Qian; Wu, Leilei; Ding, Ying; Xie, Dong; Xu, Zhijian; Zhu, Weiliang.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38555474

RESUMO

As key oncogenic drivers in non-small-cell lung cancer (NSCLC), various mutations in the epidermal growth factor receptor (EGFR) with variable drug sensitivities have been a major obstacle for precision medicine. To achieve clinical-level drug recommendations, a platform for clinical patient case retrieval and reliable drug sensitivity prediction is highly expected. Therefore, we built a database, D3EGFRdb, with the clinicopathologic characteristics and drug responses of 1339 patients with EGFR mutations via literature mining. On the basis of D3EGFRdb, we developed a deep learning-based prediction model, D3EGFRAI, for drug sensitivity prediction of new EGFR mutation-driven NSCLC. Model validations of D3EGFRAI showed a prediction accuracy of 0.81 and 0.85 for patients from D3EGFRdb and our hospitals, respectively. Furthermore, mutation scanning of the crucial residues inside drug-binding pockets, which may occur in the future, was performed to explore their drug sensitivity changes. D3EGFR is the first platform to achieve clinical-level drug response prediction of all approved small molecule drugs for EGFR mutation-driven lung cancer and is freely accessible at https://www.d3pharma.com/D3EGFR/index.php.

Assuntos

Carcinoma Pulmonar de Células não Pequenas , Aprendizado Profundo , Neoplasias Pulmonares , Humanos , Neoplasias Pulmonares/tratamento farmacológico , Neoplasias Pulmonares/genética , Carcinoma Pulmonar de Células não Pequenas/tratamento farmacológico , Carcinoma Pulmonar de Células não Pequenas/genética , Carcinoma Pulmonar de Células não Pequenas/patologia , Receptores ErbB/genética , Mutação , Armazenamento e Recuperação da Informação

12.

3D-MRI super-resolution reconstruction using multi-modality based on multi-resolution CNN.

Kang, Li; Tang, Bin; Huang, Jianjun; Li, Jianping.

Comput Methods Programs Biomed ; 248: 108110, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38452685

RESUMO

BACKGROUND AND OBJECTIVE: High-resolution (HR) MR images provide rich structural detail to assist physicians in clinical diagnosis and treatment plan. However, it is arduous to acquire HR MRI due to equipment limitations, scanning time or patient comfort. Instead, HR MRI could be obtained through a number of computer assisted post-processing methods that have proven to be effective and reliable. This paper aims to develop a convolutional neural network (CNN) based super-resolution reconstruction framework for low-resolution (LR) T2w images. METHOD: In this paper, we propose a novel multi-modal HR MRI generation framework based on deep learning techniques. Specifically, we construct a CNN based on multi-resolution analysis to learn an end-to-end mapping between LR T2w and HR T2w, where HR T1w is fed into the network to offer detailed a priori information to help generate HR T2w. Furthermore, a low-frequency filtering module is introduced to filter out the interference from HR-T1w during high-frequency information extraction. Based on the idea of multi-resolution analysis, detailed features extracted from HR T1w and LR T2w are fused at two scales in the network and then HR T2w is reconstructed by upsampling and dense connectivity module. RESULTS: Extensive quantitative and qualitative evaluations demonstrate that the proposed method enhances the recovered HR T2w details and outperforms other state-of-the-art methods. In addition, the experimental results also suggest that our network has a lightweight structure and favorable generalization performance. CONCLUSION: The results show that the proposed method is capable of reconstructing HR T2w with higher accuracy. Meanwhile, the super-resolution reconstruction results on other dataset illustrate the excellent generalization ability of the method.

Assuntos

Armazenamento e Recuperação da Informação , Médicos , Humanos , Imageamento por Ressonância Magnética , Redes Neurais de Computação , Processamento de Imagem Assistida por Computador

13.

AssistMED project: Transforming cardiology cohort characterisation from electronic health records through natural language processing - Algorithm design, preliminary results, and field prospects.

Maciejewski, Cezary; Ozieranski, Krzysztof; Barwiolek, Adam; Basza, Mikolaj; Bozym, Aleksandra; Ciurla, Michalina; Janusz Krajsman, Maciej; Maciejewska, Magdalena; Lodzinski, Piotr; Opolski, Grzegorz; Grabowski, Marcin; Cacko, Andrzej; Balsam, Pawel.

Int J Med Inform ; 185: 105380, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38447318

RESUMO

INTRODUCTION: Electronic health records (EHR) are of great value for clinical research. However, EHR consists primarily of unstructured text which must be analysed by a human and coded into a database before data analysis- a time-consuming and costly process limiting research efficiency. Natural language processing (NLP) can facilitate data retrieval from unstructured text. During AssistMED project, we developed a practical, NLP tool that automatically provides comprehensive clinical characteristics of patients from EHR, that is tailored to clinical researchers needs. MATERIAL AND METHODS: AssistMED retrieves patient characteristics regarding clinical conditions, medications with dosage, and echocardiographic parameters with clinically oriented data structure and provides researcher-friendly database output. We validate the algorithm performance against manual data retrieval and provide critical quantitative and qualitative analysis. RESULTS: AssistMED analysed the presence of 56 clinical conditions, medications from 16 drug groups with dosage and 15 numeric echocardiographic parameters in a sample of 400 patients hospitalized in the cardiology unit. No statistically significant differences between algorithm and human retrieval were noted. Qualitative analysis revealed that disagreements with manual annotation were primarily accounted to random algorithm errors, erroneous human annotation and lack of advanced context awareness of our tool. CONCLUSIONS: Current NLP approaches are feasible to acquire accurate and detailed patient characteristics tailored to clinical researchers' needs from EHR. We present an in-depth description of an algorithm development and validation process, discuss obstacles and pinpoint potential solutions, including opportunities arising with recent advancements in the field of NLP, such as large language models.

Assuntos

Cardiologia , Processamento de Linguagem Natural , Humanos , Registros Eletrônicos de Saúde , Algoritmos , Armazenamento e Recuperação da Informação

14.

Concatenated Nanopore DNA Codes.

Vidal, Adrian; Wijekoon, V B; Viterbo, Emanuele.

IEEE Trans Nanobioscience ; 23(2): 310-318, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38546987

RESUMO

In nanopore sequencers, single-stranded DNA molecules (or k-mers) enter a small opening in a membrane called a nanopore and modulate the ionic current through the pore, producing a channel output in the form of a noisy piecewise constant signal. An important problem in DNA-based data storage is finding a set of k-mers, i.e. a DNA code, that is robust against noisy sample duplication introduced by nanopore sequencers. Good DNA codes should contain as many k-mers as possible that produce distinguishable current signals (squiggles) as measured by the sequencer. The dissimilarity between squiggles can be estimated using a bound on their pairwise error probability, which is used as a metric for code design. Unfortunately, code construction using the union bound is limited to small k's due to the difficulty of finding maximum cliques in large graphs. In this paper, we construct large codes by concatenating codewords from a base code, thereby packing more information in a single strand while retaining the storage efficiency of the base code. To facilitate decoding, we include a circumfix in the base code to reduce the effect of the nanopore channel memory. We show that the decoding complexity scales as [Formula: see text], where m is the number of concatenated k-mers. Simulations show that the base code error rate is stable as m increases.

Assuntos

DNA Concatenado , Nanoporos , DNA/genética , Análise de Sequência de DNA , Armazenamento e Recuperação da Informação

15.

Utilizing large language models in breast cancer management: systematic review.

Sorin, Vera; Glicksberg, Benjamin S; Artsi, Yaara; Barash, Yiftach; Konen, Eli; Nadkarni, Girish N; Klang, Eyal.

J Cancer Res Clin Oncol ; 150(3): 140, 2024 Mar 19.

Artigo em Inglês | MEDLINE | ID: mdl-38504034

RESUMO

PURPOSE: Despite advanced technologies in breast cancer management, challenges remain in efficiently interpreting vast clinical data for patient-specific insights. We reviewed the literature on how large language models (LLMs) such as ChatGPT might offer solutions in this field. METHODS: We searched MEDLINE for relevant studies published before December 22, 2023. Keywords included: "large language models", "LLM", "GPT", "ChatGPT", "OpenAI", and "breast". The risk bias was evaluated using the QUADAS-2 tool. RESULTS: Six studies evaluating either ChatGPT-3.5 or GPT-4, met our inclusion criteria. They explored clinical notes analysis, guideline-based question-answering, and patient management recommendations. Accuracy varied between studies, ranging from 50 to 98%. Higher accuracy was seen in structured tasks like information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, dependency on the way questions are posed (prompt-dependency), and in some cases, missing critical clinical information. CONCLUSION: LLMs hold potential in breast cancer care, especially in textual information extraction and guideline-driven clinical question-answering. Yet, their inconsistent accuracy underscores the need for careful validation of these models, and the importance of ongoing supervision.

Assuntos

Neoplasias da Mama , Humanos , Feminino , Neoplasias da Mama/terapia , Mama , Armazenamento e Recuperação da Informação , Idioma

16.

FedFSA: Hybrid and federated framework for functional status ascertainment across institutions.

Fu, Sunyang; Jia, Heling; Vassilaki, Maria; Keloth, Vipina K; Dang, Yifang; Zhou, Yujia; Garg, Muskan; Petersen, Ronald C; St Sauver, Jennifer; Moon, Sungrim; Wang, Liwei; Wen, Andrew; Li, Fang; Xu, Hua; Tao, Cui; Fan, Jungwei; Liu, Hongfang; Sohn, Sunghwan.

J Biomed Inform ; 152: 104623, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38458578

RESUMO

INTRODUCTION: Patients' functional status assesses their independence in performing activities of daily living, including basic ADLs (bADL), and more complex instrumental activities (iADL). Existing studies have discovered that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Depite their usefulness, much of the functional status information is stored in electronic health records (EHRs) in either semi-structured or free text formats. This indicates the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduced FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and open-source Flower and PyTorch library for federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranges from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranges from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across four healthcare sites. For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using a FedFSA framework performed higher than non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of the federated learning framework in functional status extraction and impairment classification in EHRs, exemplifying the importance of a large-scale, multi-institutional collaborative development effort.

Assuntos

Atividades Cotidianas , Estado Funcional , Humanos , Idoso , Aprendizagem , Armazenamento e Recuperação da Informação , Processamento de Linguagem Natural

17.

Design and development of patient health tracking, monitoring and big data storage using Internet of Things and real time cloud computing.

Shafi, Imran; Din, Sadia; Farooq, Siddique; Díez, Isabel de la Torre; Breñosa, Jose; Espinosa, Julio César Martínez; Ashraf, Imran.

PLoS One ; 19(3): e0298582, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38466691

RESUMO

With the outbreak of the COVID-19 pandemic, social isolation and quarantine have become commonplace across the world. IoT health monitoring solutions eliminate the need for regular doctor visits and interactions among patients and medical personnel. Many patients in wards or intensive care units require continuous monitoring of their health. Continuous patient monitoring is a hectic practice in hospitals with limited staff; in a pandemic situation like COVID-19, it becomes much more difficult practice when hospitals are working at full capacity and there is still a risk of medical workers being infected. In this study, we propose an Internet of Things (IoT)-based patient health monitoring system that collects real-time data on important health indicators such as pulse rate, blood oxygen saturation, and body temperature but can be expanded to include more parameters. Our system is comprised of a hardware component that collects and transmits data from sensors to a cloud-based storage system, where it can be accessed and analyzed by healthcare specialists. The ESP-32 microcontroller interfaces with the multiple sensors and wirelessly transmits the collected data to the cloud storage system. A pulse oximeter is utilized in our system to measure blood oxygen saturation and body temperature, as well as a heart rate monitor to measure pulse rate. A web-based interface is also implemented, allowing healthcare practitioners to access and visualize the collected data in real-time, making remote patient monitoring easier. Overall, our IoT-based patient health monitoring system represents a significant advancement in remote patient monitoring, allowing healthcare practitioners to access real-time data on important health metrics and detect potential health issues before they escalate.

Assuntos

Computação em Nuvem , Internet das Coisas , Humanos , Pandemias , Monitorização Fisiológica , Armazenamento e Recuperação da Informação

18.

Data retrieval from archival renal biopsies using nonlinear microscopy.

Cahill, Lucas C; Yoshitake, Tadayuki; Rosen, Milan; Weber, Timothy D; Fujimoto, James G; Rosen, Seymour.

PLoS One ; 19(3): e0299506, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38489324

RESUMO

Thorough examination of renal biopsies may improve understanding of renal disease. Imaging of renal biopsies with fluorescence nonlinear microscopy (NLM) and optical clearing enables three-dimensional (3D) visualization of pathology without microtome sectioning. Archival renal paraffin blocks from 12 patients were deparaffinized and stained with Hoechst and Eosin for fluorescent nuclear and cytoplasmic/stromal contrast, then optically cleared using benzyl alcohol benzyl benzoate (BABB). NLM images of entire biopsy fragments (thickness range 88-660 µm) were acquired using NLM with fluorescent signals mapped to an H&E color scale. Cysts, glomeruli, exudative lesions, and Kimmelstiel-Wilson nodules were segmented in 3D and their volumes, diameters, and percent composition could be obtained. The glomerular count on 3D NLM volumes was high indicating that archival blocks could be a vast tissue resource to enable larger-scale retrospective studies. Rapid optical clearing and NLM imaging enables more thorough biopsy examination and is a promising technique for analysis of archival paraffin blocks.

Assuntos

Corantes , Parafina , Humanos , Estudos Retrospectivos , Microscopia de Fluorescência , Biópsia , Armazenamento e Recuperação da Informação , Imageamento Tridimensional/métodos , Microscopia Confocal

19.

Comparison of attribute-based encryption schemes in securing healthcare systems.

Walid, Redwan; Joshi, Karuna Pande; Choi, Seung Geol.

Sci Rep ; 14(1): 7147, 2024 03 26.

Artigo em Inglês | MEDLINE | ID: mdl-38532119

RESUMO

E-health has become a top priority for healthcare organizations focused on advancing healthcare services. Thus, medical organizations have been widely adopting cloud services, resulting in the effective storage of sensitive data. To prevent privacy and security issues associated with the data, attribute-based encryption (ABE) has been a popular choice for encrypting private data. Likewise, the attribute-based access control (ABAC) technique has been widely adopted for controlling data access. Researchers have proposed electronic health record (EHR) systems using ABE techniques like ciphertext policy attribute-based encryption (CP-ABE), key policy attribute-based encryption (KP-ABE), and multi authority attribute-based encryption (MA-ABE). However, there is a lack of rigorous comparison among the various ABE schemes used in healthcare systems. To better understand the usability of ABE techniques in medical systems, we performed a comprehensive review and evaluation of the three popular ABE techniques by developing EHR systems using knowledge graphs with the same data but different encryption mechanisms. We have used the MIMIC-III dataset with varying record sizes for this study. This paper can help healthcare organizations or researchers using ABE in their systems to comprehend the correct usage scenario and the prospect of ABE deployment in the most recent technological evolution.

Assuntos

Registros Eletrônicos de Saúde , Armazenamento e Recuperação da Informação , Algoritmos , Segurança Computacional , Computação em Nuvem , Atenção à Saúde

20.

How Does ChatGPT Use Source Information Compared With Google? A Text Network Analysis of Online Health Information.

Shen, Oscar Y; Pratap, Jayanth S; Li, Xiang; Chen, Neal C; Bhashyam, Abhiram R.

Clin Orthop Relat Res ; 482(4): 578-588, 2024 Apr 01.

Artigo em Inglês | MEDLINE | ID: mdl-38517757

RESUMO

BACKGROUND: The lay public is increasingly using ChatGPT (a large language model) as a source of medical information. Traditional search engines such as Google provide several distinct responses to each search query and indicate the source for each response, but ChatGPT provides responses in paragraph form in prose without providing the sources used, which makes it difficult or impossible to ascertain whether those sources are reliable. One practical method to infer the sources used by ChatGPT is text network analysis. By understanding how ChatGPT uses source information in relation to traditional search engines, physicians and physician organizations can better counsel patients on the use of this new tool. QUESTIONS/PURPOSES: (1) In terms of key content words, how similar are ChatGPT and Google Search responses for queries related to topics in orthopaedic surgery? (2) Does the source distribution (academic, governmental, commercial, or form of a scientific manuscript) differ for Google Search responses based on the topic's level of medical consensus, and how is this reflected in the text similarity between ChatGPT and Google Search responses? (3) Do these results vary between different versions of ChatGPT? METHODS: We evaluated three search queries relating to orthopaedic conditions: "What is the cause of carpal tunnel syndrome?," "What is the cause of tennis elbow?," and "Platelet-rich plasma for thumb arthritis?" These were selected because of their relatively high, medium, and low consensus in the medical evidence, respectively. Each question was posed to ChatGPT version 3.5 and version 4.0 20 times for a total of 120 responses. Text network analysis using term frequency-inverse document frequency (TF-IDF) was used to compare text similarity between responses from ChatGPT and Google Search. In the field of information retrieval, TF-IDF is a weighted statistical measure of the importance of a key word to a document in a collection of documents. Higher TF-IDF scores indicate greater similarity between two sources. TF-IDF scores are most often used to compare and rank the text similarity of documents. Using this type of text network analysis, text similarity between ChatGPT and Google Search can be determined by calculating and summing the TF-IDF for all keywords in a ChatGPT response and comparing it with each Google search result to assess their text similarity to each other. In this way, text similarity can be used to infer relative content similarity. To answer our first question, we characterized the text similarity between ChatGPT and Google Search responses by finding the TF-IDF scores of the ChatGPT response and each of the 20 Google Search results for each question. Using these scores, we could compare the similarity of each ChatGPT response to the Google Search results. To provide a reference point for interpreting TF-IDF values, we generated randomized text samples with the same term distribution as the Google Search results. By comparing ChatGPT TF-IDF to the random text sample, we could assess whether TF-IDF values were statistically significant from TF-IDF values obtained by random chance, and it allowed us to test whether text similarity was an appropriate quantitative statistical measure of relative content similarity. To answer our second question, we classified the Google Search results to better understand sourcing. Google Search provides 20 or more distinct sources of information, but ChatGPT gives only a single prose paragraph in response to each query. So, to answer this question, we used TF-IDF to ascertain whether the ChatGPT response was principally driven by one of four source categories: academic, government, commercial, or material that took the form of a scientific manuscript but was not peer-reviewed or indexed on a government site (such as PubMed). We then compared the TF-IDF similarity between ChatGPT responses and the source category. To answer our third research question, we repeated both analyses and compared the results when using ChatGPT 3.5 versus ChatGPT 4.0. RESULTS: The ChatGPT response was dominated by the top Google Search result. For example, for carpal tunnel syndrome, the top result was an academic website with a mean TF-IDF of 7.2. A similar result was observed for the other search topics. To provide a reference point for interpreting TF-IDF values, a randomly generated sample of text compared with Google Search would have a mean TF-IDF of 2.7 ± 1.9, controlling for text length and keyword distribution. The observed TF-IDF distribution was higher for ChatGPT responses than for random text samples, supporting the claim that keyword text similarity is a measure of relative content similarity. When comparing source distribution, the ChatGPT response was most similar to the most common source category from Google Search. For the subject where there was strong consensus (carpal tunnel syndrome), the ChatGPT response was most similar to high-quality academic sources rather than lower-quality commercial sources (TF-IDF 8.6 versus 2.2). For topics with low consensus, the ChatGPT response paralleled lower-quality commercial websites compared with higher-quality academic websites (TF-IDF 14.6 versus 0.2). ChatGPT 4.0 had higher text similarity to Google Search results than ChatGPT 3.5 (mean increase in TF-IDF similarity of 0.80 to 0.91; p < 0.001). The ChatGPT 4.0 response was still dominated by the top Google Search result and reflected the most common search category for all search topics. CONCLUSION: ChatGPT responses are similar to individual Google Search results for queries related to orthopaedic surgery, but the distribution of source information can vary substantially based on the relative level of consensus on a topic. For example, for carpal tunnel syndrome, where there is widely accepted medical consensus, ChatGPT responses had higher similarity to academic sources and therefore used those sources more. When fewer academic or government sources are available, especially in our search related to platelet-rich plasma, ChatGPT appears to have relied more heavily on a small number of nonacademic sources. These findings persisted even as ChatGPT was updated from version 3.5 to version 4.0. CLINICAL RELEVANCE: Physicians should be aware that ChatGPT and Google likely use the same sources for a specific question. The main difference is that ChatGPT can draw upon multiple sources to create one aggregate response, while Google maintains its distinctness by providing multiple results. For topics with a low consensus and therefore a low number of quality sources, there is a much higher chance that ChatGPT will use less-reliable sources, in which case physicians should take the time to educate patients on the topic or provide resources that give more reliable information. Physician organizations should make it clear when the evidence is limited so that ChatGPT can reflect the lack of quality information or evidence.

Assuntos

Síndrome do Túnel Carpal , Ferramenta de Busca , Humanos , Armazenamento e Recuperação da Informação

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA